Filled-pause Modeling for Medical Transcriptions

Authors

  • Hauke Schramm
  • Xavier L. Aubert
  • Carsten Meyer
  • Jochen Peters
Abstract

We present our recent progress in filled pause (FP) modeling for a highly spontaneous medical transcription task. Our studies confirm that FP modeling is an important topic for spontaneous speech applications and must be addressed explicitly in acoustic, lexical, and language modeling. We provide a framework for data-driven lexical modeling of FP acoustic variability with respect to phonemic realization and duration. By using a number of properly weighted FP pronunciation variants of variable length and applying specific acoustic models for FP, we achieved an 8% relative reduction in word error rate. We also tested different approaches for handling FP in the language model and integrating FP into the decoder. The best results with respect to both perplexity and word error rate were achieved by predicting FP probabilistically and removing it from the language model history. This approach reduces perplexity by 4% and provides a further gain in word accuracy.
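The idea of predicting FP probabilistically while removing it from the language model history can be illustrated with a toy bigram model. This is only a minimal sketch under stated assumptions, not the authors' implementation: the token name `<fp>`, the function names, and the add-alpha smoothing are all hypothetical choices made for illustration. FP is counted as a predicted event after the preceding word, but the history is not advanced past it, so the next real word is still conditioned on the last non-FP word.

```python
from collections import Counter

FP = "<fp>"  # hypothetical filled-pause token, not taken from the paper


def train_bigram_fp_skipping(sentences):
    """Collect bigram counts in which FP is predicted but never used as history."""
    bigrams = Counter()
    history_counts = Counter()
    for words in sentences:
        prev = "<s>"
        for w in words:
            # FP is an ordinary predicted event given the current history.
            bigrams[(prev, w)] += 1
            history_counts[prev] += 1
            # Only non-FP words advance the history (FP is skipped).
            if w != FP:
                prev = w
    return bigrams, history_counts


def prob(bigrams, history_counts, prev, w, vocab_size, alpha=1.0):
    """Add-alpha smoothed estimate of P(w | prev); smoothing is an assumption."""
    return (bigrams[(prev, w)] + alpha) / (history_counts[prev] + alpha * vocab_size)


if __name__ == "__main__":
    data = [["the", FP, "patient"], ["the", "patient"]]
    bigrams, history_counts = train_bigram_fp_skipping(data)
    # "patient" is conditioned on "the" in both sentences, with or without FP.
    print(bigrams[("the", "patient")], bigrams[(FP, "patient")])
```

In this toy setup, a filled pause between "the" and "patient" does not fragment the counts for the word pair, which is the intuition behind dropping FP from the history.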

Similar resources

Pronunciation Variants Modeling in Korean Spontaneous Speech Recognition

Pronunciation variants in spontaneous speech tend to be more variable than in planned speech. Spontaneous speech has significant sources of variation as well as serious phonological variations, which make recognition extremely difficult. In this paper, we analyzed the auditory transcriptions of the dialogue for spontaneous speech recognition, and then classified the characteristics of conversationa...

Filled Pause Modeling

This document presents a streamlined approach to modeling filled pause distribution in spontaneous speech and populating a large clean corpus, making use of only the SRILM toolkit and a small training set. Although used for filled pause modeling, it can be fairly general and may be used to model other types of disfluencies, punctuation or sentence boundaries, with a minimal set of changes.

Acoustic Feature Analysis and Discriminative Modeling of Filled Pauses for Spontaneous Speech Recognition

Most automatic speech recognizers (ASRs) concentrate on read speech, which is different from spontaneous speech with disfluencies. ASRs cannot deal with speech with a high rate of disfluencies such as filled pauses, repetitions, lengthening, repairs, false starts and silence pauses. In this paper, we focus on the feature analysis and modeling of the filled pauses “ah,” “ung,” “um,” “em,” and “h...

Prosodic Cues and Answer Type Detection for the Deception Sub-Challenge

Deception is a deliberate act to deceive an interlocutor by transmitting a message containing false or misleading information. Detecting deception consists of searching for reliable differences between liars and truth-tellers. In this paper, we used the Deceptive Speech Database (DSD) provided for the Deception sub-challenge. DSD consists of deceptive and non-deceptive answers to a set of unkn...

Filled Pause Refinement Based on the Pronunciation Probability for Lecture Speech

Although automatic speech recognition has become quite proficient at recognizing or transcribing well-prepared fluent speech, the transcription of speech that contains many disfluencies, such as spontaneous conversational and lecture speech, remains problematic. Filled pauses (FPs) are the most frequently occurring disfluencies in this type of speech. Most recent studies have shown tha...

Publication date: 2003